Basic Statistics

Raw Counts

Name                     Value
Rows                     40,313
Columns                  16
Discrete columns         9
Continuous columns       7
All missing columns      0
Missing observations     29,442
Complete Rows            10,883
Total observations       645,008
Memory allocation        14 Mb

Percentages
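
The percentage figure for this section did not survive, but the shares can be recomputed from the Raw Counts table. The following is a minimal sketch in Python, not the report's own code. The denominators are an assumption: column counts are taken relative to all columns, complete rows relative to all rows, and missing observations relative to total observations (rows times columns). The variable names and the `pct` helper are illustrative.

```python
# Raw counts copied from the Raw Counts table in this report.
rows = 40_313
columns = 16
discrete_columns = 9
continuous_columns = 7
missing_observations = 29_442
complete_rows = 10_883
total_observations = 645_008

# Total observations is the number of cells: rows x columns.
assert rows * columns == total_observations


def pct(part, whole):
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumed denominators: columns for column counts, rows for complete rows,
# total cells for missing observations.
percentages = {
    "Discrete columns": pct(discrete_columns, columns),
    "Continuous columns": pct(continuous_columns, columns),
    "Missing observations": pct(missing_observations, total_observations),
    "Complete rows": pct(complete_rows, rows),
}

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

Under these assumptions, a bit over half of the columns are discrete, roughly one cell in twenty is missing, and only about a quarter of the rows are complete.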

Data Structure

Missing Data Profile

Univariate Distribution

Histogram

Bar Chart (with frequency)

## 8 columns ignored with more than 50 categories.
## comment_id: 28623 categories
## video_id: 52 categories
## video_title: 52 categories
## username: 22719 categories
## comment: 27458 categories
## comment_published_at: 28424 categories
## video_published_at: 52 categories
## parent_comment_id: 2968 categories
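
The message above describes a cardinality filter: a discrete column is left out of the bar charts when it has more than 50 categories. A minimal sketch of that rule in Python follows, using the category counts printed above. It is an illustration, not the reporting tool's implementation. The names `MAXCAT` and `split_by_cardinality` are made up for this example, and the low-cardinality column is hypothetical, added only to show what a retained column looks like.

```python
# Category counts copied from the report output above.
cardinality = {
    "comment_id": 28623,
    "video_id": 52,
    "video_title": 52,
    "username": 22719,
    "comment": 27458,
    "comment_published_at": 28424,
    "video_published_at": 52,
    "parent_comment_id": 2968,
    # Hypothetical column, NOT from the report; it shows the "kept" branch.
    "hypothetical_low_cardinality_column": 3,
}

MAXCAT = 50  # threshold stated in the report message


def split_by_cardinality(counts, maxcat):
    """Split column names into (kept, ignored) by number of categories."""
    kept = [name for name, n in counts.items() if n <= maxcat]
    ignored = [name for name, n in counts.items() if n > maxcat]
    return kept, ignored


kept, ignored = split_by_cardinality(cardinality, MAXCAT)
print(f"{len(ignored)} columns ignored with more than {MAXCAT} categories.")
for name in ignored:
    print(f"{name}: {cardinality[name]} categories")
```

Note that `video_id`, `video_title`, and `video_published_at` sit just above the threshold at 52 categories each, so a slightly higher limit would bring them back into the charts.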

QQ Plot

Correlation Analysis

## Warning in dummify(data, maxcat = maxcat): Ignored all discrete features since `maxcat` set to 20
## categories!
## Warning in cor(x = structure(list(subscribers = c(1330000, 1330000, 1330000, : the standard
## deviation is zero
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_text()`).
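
The "standard deviation is zero" warning arises because the Pearson correlation divides the covariance by the product of the two standard deviations, so it is undefined whenever one input is constant. The sketch below shows this in plain Python with made-up toy values, which are not taken from the dataset. The `pearson` helper is illustrative and returns `None` for the undefined case instead of raising a warning.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when either input has zero variance, because the
    coefficient is undefined in that case (division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Toy data for illustration only.
varying = [1, 2, 3, 4]
doubled = [2, 4, 6, 8]
constant = [5, 5, 5, 5]

print(pearson(varying, doubled))   # perfectly linear relationship
print(pearson(varying, constant))  # undefined: constant input
```

Cells of the correlation matrix that cannot be computed have no value to label, which may be why the plot reports rows removed by `geom_text()`; that link is an inference, not something the output states.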

Principal Component Analysis

## 5 features with more than 50 categories ignored!
## comment_id: 6717 categories
## username: 4058 categories
## comment: 6289 categories
## comment_published_at: 6704 categories
## parent_comment_id: 2966 categories
## Warning in plot_prcomp(data = structure(list(comment_id = c("UgytAJQNq333bPLo3aV4AaABAg.9CiM5OxPGHm9CicVKkQOF7", : The following features are dropped due to zero variance:
##  * replies